ICE: Idiom and Collocation Extractor for Research and Education
Abstract
Collocation and idiom extraction are well-known challenges with many potential applications in Natural Language Processing (NLP). Our experimental, open-source software system, called ICE, is a Python package for flexibly extracting collocations and idioms, currently for English. It also includes a competitive POS tagger that can be used alone or as part of collocation/idiom extraction. ICE is available free of charge for research and educational use in two user-friendly formats. This paper gives an overview of ICE and its performance, and briefly describes the research underlying the extraction algorithms.
Similar resources
Tools for Collocation Extraction: Preferences for Active vs. Passive
We present and partially evaluate procedures for the extraction of noun+verb collocation candidates from German text corpora, along with their morphosyntactic preferences, especially for the active vs. passive voice. We start from tokenized, tagged, lemmatized and chunked text, and we use extraction patterns formulated in the CQP corpus query language. We discuss the results of a precision eval...
The Performance of Iranian EFL Learners in Producing and Recognizing Idiom-Containing Sentences
This study aimed to investigate how Iranian EFL learners performed in producing sentences containing idioms and whether they had any problems in producing such sentences. This query, subsequently, raised the question of whether idioms influenced the participants’ grammaticality judgment on idiom-containing sentences. For this purpose, firstly, the writings of 24 learners were investigated for a...
Bilingual Collocation Extraction Based on Syntactic and Statistical Analyses
In this paper, we describe an algorithm that employs syntactic and statistical analysis to extract bilingual collocations from a parallel corpus. The preferred syntactic patterns are obtained from idioms and collocations in a machine-readable dictionary. Phrases matching the patterns are extracted from aligned sentences in a parallel corpus. Those phrases are subsequently matched up via cross-lin...
The Verb in the Terminological Collocations. Contribution to the Development of a Morphological Analyser: MorphoCom
Since we are observing and describing the behaviour of terminological units and terminological collocations, we discuss the value of the verb as the nuclear element of the terminological collocation in the Portuguese language. We therefore emphasize the theoretical distinction between multilexemic terminological unit and terminological collocation and the importance ...
The Comparative Effect of Using Idioms in Conversation and Paragraph Writing on EFL Learners’ Idiom Learning
This study investigated the comparative effect of teaching idiomatic expressions through practicing them in conversation and paragraph writing on intermediate EFL learners’ idiom learning. The participants were selected from a population of 134 intermediate students in Zabansara Language School in Khorramabad based on their scores on a Preliminary English Test (PET) and an idiom test piloted in...